human intuition
Human in the Latent Loop (HILL): Interactively Guiding Model Training Through Human Intuition
Geissler, Daniel, Krupp, Lars, Banwari, Vishal, Habusch, David, Zhou, Bo, Lukowicz, Paul, Karolus, Jakob
Latent space representations are critical for understanding and improving the behavior of machine learning models, yet they often remain obscure and intricate. Understanding and exploring the latent space has the potential to contribute valuable human intuition and expertise about respective domains. In this work, we present HILL, an interactive framework allowing users to incorporate human intuition into the model training by interactively reshaping latent space representations. The modifications are infused into the model training loop via a novel approach inspired by knowledge distillation, treating the user's modifications as a teacher to guide the model in reshaping its intrinsic latent representation. The process allows the model to converge more effectively and overcome inefficiencies, as well as provide beneficial insights to the user. We evaluated HILL in a user study tasking participants to train an optimal model, closely observing the employed strategies. The results demonstrated that human-guided latent space modifications enhance model performance while maintaining generalization, yet also revealing the risks of including user biases. Our work introduces a novel human-AI interaction paradigm that infuses human intuition into model training and critically examines the impact of human intervention on training strategies and potential biases.
AutoSciLab: A Self-Driving Laboratory For Interpretable Scientific Discovery
Desai, Saaketh, Addamane, Sadhvikas, Tsao, Jeffrey Y., Brener, Igal, Swiler, Laura P., Dingreville, Remi, Iyer, Prasad P.
Advances in robotic control and sensing have propelled the rise of automated scientific laboratories capable of high-throughput experiments. However, automated scientific laboratories are currently limited by human intuition in their ability to efficiently design and interpret experiments in high-dimensional spaces, throttling scientific discovery. We present AutoSciLab, a machine learning framework for driving autonomous scientific experiments, forming a surrogate researcher purposed for scientific discovery in high-dimensional spaces. AutoSciLab autonomously follows the scientific method in four steps: (i) generating high-dimensional experiments (x \in R^D) using a variational autoencoder (ii) selecting optimal experiments by forming hypotheses using active learning (iii) distilling the experimental results to discover relevant low-dimensional latent variables (z \in R^d, with d << D) with a 'directional autoencoder' and (iv) learning a human interpretable equation connecting the discovered latent variables with a quantity of interest (y = f(z)), using a neural network equation learner. We validate the generalizability of AutoSciLab by rediscovering a) the principles of projectile motion and b) the phase transitions within the spin-states of the Ising model (NP-hard problem). Applying our framework to an open-ended nanophotonics challenge, AutoSciLab uncovers a fundamentally novel method for directing incoherent light emission that surpasses the current state-of-the-art (Iyer et al. 2023b, 2020).
Task-Aware Robotic Grasping by evaluating Quality Diversity Solutions through Foundation Models
Appius, Aurel X., Garrabe, Emiland, Helenon, Francois, Khoramshahi, Mahdi, Doncieux, Stephane
Task-aware robotic grasping is a challenging problem that requires the integration of semantic understanding and geometric reasoning. Traditional grasp planning approaches focus on stable or feasible grasps, often disregarding the specific tasks the robot needs to accomplish. This paper proposes a novel framework that leverages Large Language Models (LLMs) and Quality Diversity (QD) algorithms to enable zero-shot task-conditioned grasp selection. The framework segments objects into meaningful subparts and labels each subpart semantically, creating structured representations that can be used to prompt an LLM. By coupling semantic and geometric representations of an object's structure, the LLM's knowledge about tasks and which parts to grasp can be applied in the physical world. The QD-generated grasp archive provides a diverse set of grasps, allowing us to select the most suitable grasp based on the task. We evaluate the proposed method on a subset of the YCB dataset, where a Franka Emika robot is assigned to perform various actions based on object-specific task requirements. We created a ground truth by conducting a survey with six participants to determine the best grasp region for each task-object combination according to human intuition. The model was evaluated on 12 different objects across 4--7 object-specific tasks, achieving a weighted intersection over union (IoU) of 76.4% when compared to the survey data.
Roles of LLMs in the Overall Mental Architecture
To better understand existing LLMs, we may examine the human mental (cognitive/psychological) architecture, and its components and structures. Based on psychological, philosophical, and cognitive science literatures, it is argued that, within the human mental architecture, existing LLMs correspond well with implicit mental processes (intuition, instinct, and so on). However, beyond such implicit processes, explicit processes (with better symbolic capabilities) are also present within the human mental architecture, judging from psychological, philosophical, and cognitive science literatures. Various theoretical and empirical issues and questions in this regard are explored. Furthermore, it is argued that existing dual-process computational cognitive architectures (models of the human cognitive/psychological architecture) provide usable frameworks for fundamentally enhancing LLMs by introducing dual processes (both implicit and explicit) and, in the meantime, can also be enhanced by LLMs. The results are synergistic combinations (in several different senses simultaneously).
SHIRE: Enhancing Sample Efficiency using Human Intuition in REinforcement Learning
Joshi, Amogh, Kosta, Adarsh Kumar, Roy, Kaushik
The ability of neural networks to perform robotic perception and control tasks such as depth and optical flow estimation, simultaneous localization and mapping (SLAM), and automatic control has led to their widespread adoption in recent years. Deep Reinforcement Learning has been used extensively in these settings, as it does not have the unsustainable training costs associated with supervised learning. However, DeepRL suffers from poor sample efficiency, i.e., it requires a large number of environmental interactions to converge to an acceptable solution. Modern RL algorithms such as Deep Q Learning and Soft Actor-Critic attempt to remedy this shortcoming but can not provide the explainability required in applications such as autonomous robotics. Humans intuitively understand the long-time-horizon sequential tasks common in robotics. Properly using such intuition can make RL policies more explainable while enhancing their sample efficiency. In this work, we propose SHIRE, a novel framework for encoding human intuition using Probabilistic Graphical Models (PGMs) and using it in the Deep RL training pipeline to enhance sample efficiency. Our framework achieves 25-78% sample efficiency gains across the environments we evaluate at negligible overhead cost. Additionally, by teaching RL agents the encoded elementary behavior, SHIRE enhances policy explainability. A real-world demonstration further highlights the efficacy of policies trained using our framework.
Effective Generative AI: The Human-Algorithm Centaur
Saghafian, Soroush, Idan, Lihi
Advanced analytics science methods have enabled combining the power of artificial and human intelligence, creating \textit{centaurs} that allow superior decision-making. Centaurs are hybrid human-algorithm AI models that combine both formal analytics and human intuition in a symbiotic manner within their learning and reasoning process. We argue that the future of AI development and use in many domains needs to focus on centaurs as opposed to traditional AI approaches. This paradigm shift from traditional AI methods to centaur-based AI methods raises some fundamental questions: How are centaurs different from traditional human-in-the-loop methods? What are the most effective methods for creating centaurs? When should centaurs be used, and when should the lead be given to traditional AI models? Doesn't the incorporation of human intuition -- which at times can be misleading -- in centaurs' decision-making process degrade its performance compared to traditional AI methods? This work aims to address these fundamental questions, focusing on recent advancements in generative AI, and especially in Large Language Models (LLMs), as a main case study to illustrate centaurs' critical essentiality to future AI endeavors.
GROOViST: A Metric for Grounding Objects in Visual Storytelling
Surikuchi, Aditya K, Pezzelle, Sandro, Fernández, Raquel
A proper evaluation of stories generated for a sequence of images -- the task commonly referred to as visual storytelling -- must consider multiple aspects, such as coherence, grammatical correctness, and visual grounding. In this work, we focus on evaluating the degree of grounding, that is, the extent to which a story is about the entities shown in the images. We analyze current metrics, both designed for this purpose and for general vision-text alignment. Given their observed shortcomings, we propose a novel evaluation tool, GROOViST, that accounts for cross-modal dependencies, temporal misalignments (the fact that the order in which entities appear in the story and the image sequence may not match), and human intuitions on visual grounding. An additional advantage of GROOViST is its modular design, where the contribution of each component can be assessed and interpreted individually.
Understanding the Role of Human Intuition on Reliance in Human-AI Decision-Making with Explanations
Chen, Valerie, Liao, Q. Vera, Vaughan, Jennifer Wortman, Bansal, Gagan
AI explanations are often mentioned as a way to improve human-AI decision-making, but empirical studies have not found consistent evidence of explanations' effectiveness and, on the contrary, suggest that they can increase overreliance when the AI system is wrong. While many factors may affect reliance on AI support, one important factor is how decision-makers reconcile their own intuition -- beliefs or heuristics, based on prior knowledge, experience, or pattern recognition, used to make judgments -- with the information provided by the AI system to determine when to override AI predictions. We conduct a think-aloud, mixed-methods study with two explanation types (feature- and example-based) for two prediction tasks to explore how decision-makers' intuition affects their use of AI predictions and explanations, and ultimately their choice of when to rely on AI. Our results identify three types of intuition involved in reasoning about AI predictions and explanations: intuition about the task outcome, features, and AI limitations. Building on these, we summarize three observed pathways for decision-makers to apply their own intuition and override AI predictions. We use these pathways to explain why (1) the feature-based explanations we used did not improve participants' decision outcomes and increased their overreliance on AI, and (2) the example-based explanations we used improved decision-makers' performance over feature-based explanations and helped achieve complementary human-AI performance. Overall, our work identifies directions for further development of AI decision-support systems and explanation methods that help decision-makers effectively apply their intuition to achieve appropriate reliance on AI.
Probabilistic RRT Connect with intermediate goal selection for online planning of autonomous vehicles
Patel, Darshit, Eskandarian, Azim
Rapidly Exploring Random Trees (RRT) is one of the most widely used algorithms for motion planning in the field of robotics. To reduce the exploration time, RRT-Connect was introduced where two trees are simultaneously formed and eventually connected. Probabilistic RRT used the concept of position probability map to introduce goal biasing for faster convergence. In this paper, we propose a modified method to combine the pRRT and RRT-Connect techniques and obtain a feasible trajectory around the obstacles quickly. Instead of forming a single tree from the start point to the destination point, intermediate goal points are selected around the obstacles. Multiple trees are formed to connect the start, destination, and intermediate goal points. These partial trees are eventually connected to form an overall safe path around the obstacles. The obtained path is tracked using an MPC + Stanley controller which results in a trajectory with control commands at each time step. The trajectories generated by the proposed methods are more optimal and in accordance with human intuition. The algorithm is compared with the standard RRT and pRRT for studying its relative performance.
The age of AI diplomacy
We've long known that computers can beat us at chess, so does it matter if they have started to beat us at more verbal and collaborative games such as Diplomacy? It certainly does, and suggests a future in which artificial intelligence may begin to play a growing role in the whole spectrum of international affairs, from crafting communiqués to solving disputes and analysing intelligence briefings. Diplomacy, a strategic board game that was a favourite of both Henry Kissinger and John F. Kennedy, is set in Europe before the first world war. The objective is to gain control of at least half the board by negotiating alliances via private one-to-one conversations. There are no binding agreements, so players can misrepresent their plans and double-deal.